Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof
Abstract
Handling faults consistently in a distributed environment requires, among a small set of basic routines, an agreement algorithm that allows surviving entities to reach a consensual decision between a bounded set of volatile resources. This paper presents an algorithm that implements an Early Returning Agreement (ERA) in pseudo-synchronous systems, which optimistically allows a process to resume its activity while guaranteeing strong progress. We prove the correctness of our ERA algorithm and expose its logarithmic behavior, an extremely desirable property for any algorithm that targets future exascale platforms. We detail a practical implementation of this consensus algorithm in the context of an MPI library, and evaluate both its efficiency and scalability through a set of benchmarks and two fault-tolerant scientific applications.
Similar resources
Formal Verification of Consensus Algorithms Tolerating Malicious Faults
Consensus is the paradigmatic problem in fault-tolerant distributed computing: it requires network nodes that communicate by message passing to agree on a common value even in the presence of (benign or malicious) faults. Several algorithms for solving Consensus exist, but few of them have been rigorously verified, much less so formally. The Heard-Of model proposes a simple, unifying framework fo...
A Consensus Algorithm for Synchronous Distributed Systems using Mobile Agent
In this paper, we present a consensus algorithm for synchronous distributed systems using cooperating mobile agents. The algorithm is designed within a framework for mobile agent enabled distributed server groups (MADSG), where cooperating mobile agents are used to achieve coordination among the servers. Being autonomous and cooperative, cooperating mobile agents exchange information among them...
Formalization and Correctness of the PALS Pattern for Asynchronous Real-Time Systems
Due to physical requirements, what in essence and at a higher level of abstraction is a logically synchronous real-time system often has to be realized as a distributed, asynchronous system. Getting asynchronous real-time systems right is a very error-prone and labor-intensive task. The Physically Asynchronous Logically Synchronous (PALS) architectural pattern can greatly reduce the design and ...
A Case Study of Agreement Problems in Distributed Systems: Non-Blocking Atomic Commitment
This paper considers an agreement problem whose practical interest is well known, namely the Non-Blocking Atomic Commitment Problem. First, a generic protocol solving this problem is given and then instantiations of its generic statements are provided for both synchronous and asynchronous distributed systems. These instantiations use a few basic components: timeout mechanism and reliable multic...
Anonymous Byzantine Consensus from Moderately-Hard Puzzles: A Model for Bitcoin
We present a formal model of synchronous processes without distinct identifiers (i.e., anonymous processes) that communicate using one-way public broadcasts. Our main contribution is a proof that the Bitcoin protocol achieves consensus in this model, except for a negligible probability, when Byzantine faults make up less than half the network. The protocol is scalable, since the running time an...
Publication date: 2015